More Speed and More Compression: Accelerating Pattern Matching by Text Compression
Abstract
This paper addresses the problem of speeding up string matching by text compression, and presents a compressed pattern matching (CPM) algorithm which finds a pattern within a text given as a collage system 〈D,S〉 such that variable sequence S is encoded by byte-oriented Huffman coding. The compression ratio is high compared with existing CPM algorithms addressing the problem, and the search time reduction ratio compared to the Knuth-Morris-Pratt algorithm over uncompressed text is nearly the same as the compression ratio.
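For reference, the uncompressed-text baseline named in the abstract, the Knuth-Morris-Pratt algorithm, can be sketched as below. This is a minimal textbook version and is not taken from the paper; the function name `kmp_search` is an illustrative assumption, and the sketch does not attempt the collage-system or Huffman-coded matching the paper itself presents.

```python
def kmp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text.

    Textbook Knuth-Morris-Pratt; the name and signature are illustrative only.
    """
    if not pattern:
        return []

    # failure[i] = length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    # Scan the text once; on a mismatch, fall back through the failure table
    # instead of re-reading text characters.
    matches = []
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches
```

Because the scan visits each text symbol once, the running time grows with the text length, which is why a matcher that works directly on a shorter compressed representation can plausibly approach a speed-up close to the compression ratio.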
Related resources
Pattern Matching Machine for Text Compressed Using Finite State Model
The classical pattern matching problem is to find all occurrences of patterns in a text. In many practical cases, since the text is very large and stored in secondary storage, most of the time for pattern matching is dominated by data transmission of the text. Therefore, text compression can speed up pattern matching. In this framework it is required to develop an efficient pattern m...
Correction to "lossless, near-lossless, and refinement coding of bilevel images"
We present general and unified algorithms for lossy/lossless coding of bilevel images. The compression is realized by applying arithmetic coding to conditional probabilities. As in the current JBIG standard the conditioning may be specified by a template. For better compression, the more general free tree may be used. Loss may be introduced in a preprocess on the encoding side to increase compr...
Speed-up of Aho-Corasick Pattern Matching Machines by Rearranging States
This paper describes a speed-up of string pattern matching by rearranging states in the Aho-Corasick pattern matching machine, which is a kind of finite automaton. We realized a speed-up of string pattern matching using data compression. Although we obtain a higher compression ratio using a finite state model, it does not lead to a speed-up of string pattern matching, because the pattern matching machine beco...
Byte pair encoding: a text compression scheme that accelerates pattern matching
Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is rather slow and the compression ratio is not as good as other methods such as Lempel-Ziv type compression. In this paper, we bring ...
Speeding Up Pattern Matching by Text Compression
Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular since the compression is rather slow and the compression ratio is not as good as other methods such as Lempel-Ziv type compression. In this paper, we bring ...
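The byte pair encoding scheme named in the last two abstracts can be illustrated with a small sketch. It is a simplified version written for this listing, not the implementation from either paper: the function names, the `max_rules` limit, and the choice to return the substitution rules as a dictionary are all assumptions made for illustration.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 10):
    """Repeatedly replace the most frequent adjacent byte pair with an unused byte.

    Returns (compressed_bytes, rules), where rules maps each new byte to the
    pair it stands for. Illustrative sketch; names are hypothetical.
    """
    seq = list(data)
    present = set(data)
    unused = [b for b in range(256) if b not in present]
    rules = {}
    for _ in range(max_rules):
        if not unused or len(seq) < 2:
            break
        pair, count = Counter(zip(seq, seq[1:])).most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a further rule would not help
        code = unused.pop()
        rules[code] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(code)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), rules


def bpe_decompress(data: bytes, rules) -> bytes:
    """Expand every rule byte back into its pair, using an explicit stack."""
    out = []
    stack = list(reversed(data))
    while stack:
        b = stack.pop()
        if b in rules:
            left, right = rules[b]
            stack.append(right)
            stack.append(left)
        else:
            out.append(b)
    return bytes(out)


packed, rules = bpe_compress(b"abracadabra abracadabra")
restored = bpe_decompress(packed, rules)
```

Decompression here is a single pass with a small rule table, which is consistent with the abstracts' remark that BPE decompression is fast and needs little work space, while the repeated pair counting shows why compression is comparatively slow.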